ABSTRACT

Context. Morphology is the most accessible tracer of a galaxy's physical structure, but its interpretation in the framework of galaxy
evolution remains problematic. Its quantification at high redshift requires deep, high-angular-resolution imaging, which is why
space data (HST) are usually employed. At z > 1, however, the HST visible cameras probe the UV flux, which is dominated by the
emission of young stars and could bias the estimated morphologies towards late-type systems.

Aims. In this paper we quantify the effects of this morphological k-correction at 1 < z < 2 by comparing morphologies measured in
the Ks and I bands in the COSMOS area. The Ks-band data have the advantage of probing old stellar populations in the rest
frame for z < 2, enabling the determination of galaxy morphological types unaffected by recent star formation.

Methods. In Paper I we presented a new non-parametric method of quantifying morphologies of galaxies on seeing-limited images
based on support vector machines. Here we use this method to classify ~50000 Ks-selected galaxies in the COSMOS area observed
with WIRCam at CFHT. We use a 10-dimensional volume, including 5 morphological parameters, and other characteristics of galaxies
such as luminosity and redshift. The obtained classification is used to investigate the redshift distributions and number counts per
morphological type up to z ~ 2 and to compare them to the results obtained with HST/ACS in the I-band on the same objects. We
associate to every galaxy with Ks < 21.5 and z < 2 a probability between 0 and 1 of being late-type or early-type. We use this value
to assess the accuracy of our classification as a function of physical parameters of the galaxy and to correct for classification errors.

Results. The classification is found to be reliable up to z ~ 2. The mean probability is p ~ 0.8. It decreases with redshift and with
size, especially for the early-type population, but remains above p ~ 0.7. The classification globally agrees with the one obtained
using HST/ACS for z < 1. Above z ~ 1, the I-band classification tends to find fewer early-type galaxies than the Ks-band one by a
factor ~1.5, which might be a consequence of morphological k-correction effects.

Conclusions. We therefore argue that studies based on I-band HST/ACS classifications at z > 1 could be underestimating the elliptical
population. Using our method in a Ks < 21.5 magnitude-limited sample, we observe that the fraction of the early-type population is
(21.9% ± 8%) at z ~ 1.5-2 and (32.0% ± 5%) at the present time. We will discuss the evolution of the fraction of galaxies of each type
from volume-limited samples in a forthcoming paper.

Key words. galaxies: fundamental parameters - galaxies: high redshift



1. Introduction 

In the local Universe, the distribution of galaxies is bimodal,
primarily reflecting a relationship between color and morphology.
On the one hand, spiral-like galaxies are gas-rich, form stars and
are supported by the rotation of their stars; on the other hand,
elliptical-like galaxies are gas-poor, no longer form stars and are
supported by the velocity dispersion of their stars. This is the
so-called elliptical-spiral Hubble sequence. A fundamental question
in observational cosmology is how this bimodality appears throughout
the history of the Universe. Classical approaches to tackle this
question consist in studying the evolution of the luminosity
(e.g. Ilbert et al. 2006b), the star formation rate or the mass
assembly (e.g. Arnouts et al. 2007; Bundy et al. 2006) for different
morphological types. For that purpose, large samples of galaxies are
required, with robust estimates of distances, luminosities and
morphological types.

* Based on observations obtained at the Canada-France-Hawaii
Telescope (CFHT) which is operated by the National Research Council
of Canada, the Institut National des Sciences de l'Univers of the Centre
National de la Recherche Scientifique of France, and the University of
Hawaii.

However, the difficulty in quantifying the morphology of
high-redshift objects with a few simple, reliable measurements is
still a major obstacle. The dependence on angular resolution and
wavelength makes the interpretation in terms of evolution difficult.
To overcome these difficulties, astronomers have found alternative
solutions such as classifying galaxies by spectral type (Madgwick
et al. 2002) or by spectro-photometric type (Zucca et al. 2006).
However, direct interpretation of these results in the framework of
galaxy evolution is not straightforward, since galaxies move from
one spectral class to another by a passive evolution of their
stellar populations. A classification based on structural parameters
is less sensitive to the star formation history, hence more robust
for following similar galaxies at different redshifts.

In the visible, progress over the last ten years has come in
particular from the Hubble Deep Fields (HDF) observed with the
Hubble Space Telescope. They brought observational evidence that
galaxy evolution is differentiated with respect to morphological
type and that a large fraction of distant galaxies have peculiar
morphologies that do not fit into the elliptical-spiral Hubble
sequence (Brinchmann et al. 1998; Wolf et al. 2003; Ilbert et al.
2006b). There is indeed some evidence that most of the stellar mass
assembly occurs around z ~ 1.5-2 (e.g. Abraham et al. 2007). A
better understanding of the physical processes that lead to the
present Hubble sequence should therefore come from observations in
this redshift range. In this context, near-infrared observations are
particularly important because the Ks-band flux at z ~ 1 is less
dependent on the recent history of star formation, which peaks in
the rest-frame UV, and thus gives a galaxy type from the
distribution of old stars, more closely related to the underlying
total mass than optical observations. A number of Ks-band surveys
have been carried out using ground-based telescopes with different
spatial coverages and limiting magnitudes (e.g. Gardner et al. 1993;
McCracken et al. 2000b; Maihara et al. 2001). In all cases the
morphological information is poor because of the seeing-limited
spatial resolution. Some morphological analyses have been performed
using NICMOS on HST, which is the only space instrument available in
this wavelength range. All the results are on small areas involving
a few tens of galaxies (e.g. Conselice & GNS Team 2007; Saracco
et al. 2008) or in clusters (e.g. Zirm et al. 2008), but there are
no extensive morphological classifications of field galaxies.

In Huertas-Company et al. (2008, hereafter Paper I) we therefore
proposed a generalization of the non-parametric morphological
classification methods that uses an unlimited number of dimensions
and non-linear separators, enabling us to use all the information
brought by the different morphological parameters simultaneously. We
showed that, when applied to seeing-limited data, it reduces errors
by more than a factor of 2 compared to classical non-parametric
methods, leading to a mean accuracy of ~80% correct classifications.

In this paper, we use this method to quantify the morphologies of
~50000 galaxies based on structural parameters measured in the near
infrared. Galaxies are observed in the COSMOS field with WIRCam at
CFHT in the Ks-band. We perform basic statistics such as redshift
and magnitude counts per morphological type and compare them to the
ones obtained on the same objects using HST/ACS imaging (I-band) in
order to quantify morphological k-correction effects. This
classification is intended as a framework for future studies of the
evolution of counts, luminosities, luminosity densities and
correlation functions for each morphological type over several
redshift bins.

The paper proceeds as follows: the data set and the sample
selection are presented in the next section. In Sect. 3 we describe
the technique used to quantify morphologies. In Sect. 4 we analyze
the results of the classification and show the first statistics. We
finally compare the results with the ones obtained with HST/ACS in
Sect. 5 and discuss the effects of morphological k-correction.

We use the following cosmological parameters throughout
the paper: $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and
$(\Omega_M, \Omega_\Lambda) = (0.3, 0.7)$, and
the AB system for magnitudes.

2. The data 

2.1. Description 

The Ks-band data were taken with WIRCam (Thibault et al. 2003),
installed at CFHT, in the near-infrared band (2.2 μm). The observed
area is 2.65 deg² and is centered on the COSMOS field (Scoville &
COSMOS Team 2005). Images are reduced with the Terapix pipeline and
have a pixel scale of 0.15" with a mean FWHM of 0.7". Figure 1 shows
a cutout of the final reduced image. Data are complete up to
Ks(AB) ~ 23.5. A more detailed description of the data set can be
found in McCracken et al. 2008 (in preparation).

The I-band data used in § 5 are part of the COSMOS
HST/ACS field (Koekemoer et al. 2007). The data set consists
of a contiguous 1.64 deg² field covering the entire COSMOS
field. The Advanced Camera for Surveys (ACS) together with
the F814W filter ("Broad I") were employed.

2.2. Building the Ks-band catalog

2.2.1. Detection and cleaning 

All objects having a 1.5σ signal above sky over four contiguous
pixels are detected using SExtractor (Bertin & Arnouts 1996). We
then performed a cleaning task in order to separate galaxies from
stars and spurious detections. This was done using the SExtractor
MU_MAX and MAG_AUTO parameters, which give the peak surface
brightness above the background and the Kron-like elliptical
aperture magnitude, respectively. The distribution of objects in
this parameter space clearly defines three regions that separate
extended sources from point-like or non-resolved sources and from
spurious detections. In this separation scheme, objects with very
faint magnitude and high peak surface brightness are considered as
false detections. The final reduced tiles have several strips where
the noise is considerably higher than in the other regions of the
image. Therefore, objects that fall in these regions were masked. A
mask was also applied to the objects that are too close to bright
stars (Ks < 15). The final area after applying the mask is
2.08 deg². We finally obtain, after cleaning and masking, 282 122
non-spurious sources over the whole field.

2.2.2. Star/galaxy separation 

In order to select galaxies from the total K-selected photometric
sample, we have used a number of photometric parameters to remove
candidate stars, as described below. Some of the parameters that can
be used are: (i) the CLASS_STAR parameter given by SExtractor,
providing the "stellarity index" for each object, (ii) the MU_CLASS,
i.e. the position in the MU_MAX-MAG_AUTO plane described above,
reliable up to Ks ~ 20, and (iii) the χ² of the SED fitting carried
out during the photometric redshift estimate (Ilbert et al. 2006a),
with template SEDs of both stars and galaxies.

Fig. 1. 3 x 2 cutout of the observed area. Yellow circles are stars,
magenta circles are masked objects, blue circles are galaxies and
red circles are galaxies with computed morphology (see text for
details).

We decided to use a combination of these criteria. For objects
brighter than Ks = 20, we selected as stars the objects having
CLASS_STAR > 0.95 and a photometric redshift lower than 0.05 or a
stellar spectral class from the SED fitting, in addition to the
MU_CLASS parameter. For objects with Ks > 20 the MU_CLASS parameter
was not used and the spectral class was only used when available.
The final sample consists of 27 343 point-like sources and 254 779
galaxies.
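The combination of criteria can be read in more than one way. The Python sketch below encodes one possible reading, purely as an illustration: the function name, the argument names and the boolean encoding of MU_CLASS (`mu_class_pointlike`) are assumptions, and so is the exact AND/OR grouping of the three criteria.

```python
def is_star(ks_mag, class_star, z_phot, sed_is_stellar=None, mu_class_pointlike=None):
    """Return True if an object is flagged as a candidate star.

    Assumed reading of the text: an object looks stellar when CLASS_STAR > 0.95
    and either z_phot < 0.05 or the SED fit prefers a stellar template.
    Brighter than Ks = 20, MU_CLASS must also place it on the point-like locus.
    `sed_is_stellar` may be None when no spectral class is available.
    """
    stellar_sed = bool(sed_is_stellar) if sed_is_stellar is not None else False
    looks_stellar = class_star > 0.95 and (z_phot < 0.05 or stellar_sed)
    if ks_mag < 20.0:
        # MU_CLASS is only considered reliable on the bright side.
        return looks_stellar and bool(mu_class_pointlike)
    return looks_stellar
```

Under this reading, a faint object is flagged from CLASS_STAR and the SED information alone, since MU_CLASS is ignored for Ks > 20.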

2.2.3. The photometric redshift catalog 

Photometric redshifts are computed in Ilbert et al. (2009) using
a χ² template-fitting method. They are computed with 30 broad,
intermediate, and narrow bands covering the UV (GALEX),
visible-NIR (SUBARU, CFHT, UKIRT and NOAO) and mid-IR
(Spitzer/IRAC). Measurements were calibrated with large
spectroscopic samples from VLT-VIMOS and Keck-DEIMOS.
Comparison of the derived photo-z with 4148 spectroscopic
redshifts ($\Delta z = z_s - z_p$) indicates a dispersion of
$\sigma_{\Delta z/(1+z_s)} = 0.007$ at $I_{AB} < 22.5$. For more
details on how the redshifts are computed, please see Ilbert et al.
(2009).

Our final catalog has 198 684 objects with reliable photometric
redshift measurements.



2.2.4. The morphological catalog 

The morphological analysis is made on a subsample of the initial
catalog: first we select only the galaxies that have a measured
photometric redshift. Then we cut the catalog to Ks < 21.5 and
z_phot < 2. This decision is based on a visual inspection; objects
fainter than 21.5 have a S/N per pixel lower than 5, so we decided
not to include them in the morphological study. Simulations of those
objects in fact show that the morphological classifications obtained
are highly contaminated. Photometric redshifts above z ~ 2 are not
reliable enough according to Ilbert et al. (2009). This selection
results in a final morphological catalog of 44 089 galaxies. Fig. 2
shows the redshift distribution of the final catalog.

Fig. 2. Redshift distribution for the 44 089 analyzed galaxies.
Error bars are calculated using Poisson √n statistics.
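As a minimal sketch of the selection just described, the following Python functions apply the two cuts and build a redshift histogram with Poisson √n error bars. The dictionary keys (`ks`, `z_phot`) and the bin width are illustrative assumptions, not the actual catalog column names or binning.

```python
import math

def select_morphology_sample(catalog, ks_limit=21.5, z_limit=2.0):
    """Keep galaxies with a measured photo-z, Ks < ks_limit and z_phot < z_limit."""
    return [g for g in catalog
            if g.get("z_phot") is not None
            and g["ks"] < ks_limit and g["z_phot"] < z_limit]

def redshift_histogram(sample, bin_width=0.25, z_max=2.0):
    """Return (bin_low_edge, count, poisson_error) for each redshift bin."""
    n_bins = int(round(z_max / bin_width))
    counts = [0] * n_bins
    for g in sample:
        idx = min(int(g["z_phot"] / bin_width), n_bins - 1)
        counts[idx] += 1
    # Poisson statistics: the error on a count n is sqrt(n).
    return [(i * bin_width, n, math.sqrt(n)) for i, n in enumerate(counts)]
```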

In order to verify precisely whether the sample is complete for all
the morphological types and to quantify the selection effects, we
generated 5000 fake galaxies with exponential (Freeman 1970) and
de Vaucouleurs (de Vaucouleurs 1948) profiles of different
morphological types (bulge fraction uniformly distributed between 0
and 1) and with galaxy sizes uniformly distributed between 0 and
1.5", and dropped them in the real background images. We then tried
to detect them with SExtractor. Results are shown in Figure 3. As we
can see, the sample is complete for all the bulge fraction values up
to our magnitude limit (Ks = 21.5). We also looked at the
completeness as a function of size. For that purpose we generated
pure bulge and pure disk profiles with different sizes and detected
them with SExtractor (Fig. 4). As expected, bulges are detected up
to fainter magnitudes; however, the sample is complete up to
Ks = 21.5 for both bulges and disks for sizes ranging from 0 to
1.5".
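A completeness curve of this kind is the ratio of recovered to injected mock galaxies in each bin. The sketch below shows that bookkeeping only; the data layout (a list of dictionaries with an `id` key and a set of recovered ids) is a hypothetical choice, and the matching of detections to injected objects is assumed to have been done beforehand.

```python
def completeness_by_bin(injected, recovered_ids, key, edges):
    """Fraction of injected mock galaxies that were recovered, per bin of `key`.

    injected      : list of dicts with an "id" and the binned quantity
    recovered_ids : set of ids matched to a detection
    edges         : increasing bin edges; empty bins return None
    """
    n_in = [0] * (len(edges) - 1)
    n_out = [0] * (len(edges) - 1)
    for gal in injected:
        for i in range(len(edges) - 1):
            if edges[i] <= gal[key] < edges[i + 1]:
                n_in[i] += 1
                n_out[i] += gal["id"] in recovered_ids
                break
    return [out / n if n else None for out, n in zip(n_out, n_in)]
```

The same function can be called with the bulge fraction, the magnitude or the size as `key`.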



3. Morphology 

Galaxies in the catalog have been separated into two main
morphological types (late-type and early-type) using the freely
available code galSVM¹ (Huertas-Company et al. 2007). By late-type
we mean spiral and irregular galaxies; early-type galaxies include
elliptical and lenticular types. galSVM is a non-parametric
N-dimensional code based on support vector machines (SVM) that uses
a training set built from a local visually classified sample. The
employed procedure can be summarized in 4 main steps (see Paper I
for more details):

1. Build a training set: we select a nearby visually classified
sample at wavelengths corresponding to the rest-frame of the
high redshift sample to be analyzed. In our case, we want to
simulate Ks-band observations. We therefore used an SDSS
local sample observed in the i and z bands, which roughly
correspond to the rest-frame wavelengths between z ~ 1
and z ~ 2 for Ks-band observations. We then move the sample
to the proper redshift and image quality and drop it in the
real background.

2. Measure a set of morphological parameters on the sample.

3. Train a support vector based learning machine with a fraction
of the simulated sample and use the other fraction to test and
estimate errors.

4. Classify real data with the trained machine and correct for
possible systematic errors detected in the testing step.

¹ http://www.lesia.obspm.fr/~huertas/galsvm.html

Fig. 3. Completeness for extended sources as a function of the
bulge fraction (B/T) as assessed from a mock sample of 5000
galaxies dropped in the field images.

3.1. The training sample 

The most important step in obtaining the morphology with a
non-parametric method is to correctly calibrate the volume filled by
the data in the multi-dimensional space. This is a critical step
since it determines the decision regions that will be used to
perform the classification. Indeed, the derived galaxy morphology
depends on the physical properties of the galaxy (luminosity,
redshift, wavelength) and on the observing conditions (background
level, resolution). A suitable calibration set should consequently
reproduce closely all the properties of the sample to be analyzed.
One classical approach consists in visually classifying a fraction
of the sample and using it as a training set to optimize boundaries
(Menanteau et al. 1999, 2006). However, this is not possible for
seeing-limited data, where the resolution is too low to enable a
reliable visual classification. Here we therefore decided to
simulate the high redshift sample from a visually classified local
catalog, selected in the rest-frame color of the high redshift
sample. This has three main advantages: first, it is less affected
by k-correction effects; second, it does not introduce any modeling
effect, since the galaxies used are real; and finally, the training
set is built to reproduce the observing conditions and physical
properties of the sample to be analyzed, but it is classified safely
on well-resolved images, so it does not need to have a particularly
high resolution.

Fig. 4. Completeness as a function of size for simulated disks (a)
and bulges (b) as assessed from a mock sample of 5000 galaxies
dropped in the field images. The size is represented by the disk
scale length for disks (r_d) and by the bulge effective radius (r_e)
for bulges.

We use a catalog of 1319 objects from the Sloan Digital Sky
Survey observed in two photometric bands (z and i) and visually
classified (Tasca & White 2005). As explained in Paper I, for every
galaxy stamp we first generate a random pair of (magnitude,
redshift) values with a probability distribution that matches the
real magnitude and redshift distribution of the sample to be
simulated, and then we proceed in four steps: a) removal of
foreground stars, b) degradation of the resolution according to a
ΛCDM cosmology, c) binning to reach the desired pixel scale
and d) dropping in a real background image.

The photometric band (z or i) used to create the galaxy de- 
pends on the associated redshift. We choose the one that is closer 
to the rest-frame band at this given z. 
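The band choice can be illustrated with a few lines of Python. The effective wavelengths below are approximate values assumed for the illustration (Ks near 2.15 μm, SDSS i near 0.75 μm, SDSS z near 0.89 μm), and the function name is hypothetical.

```python
# Approximate effective wavelengths in microns (assumed values for illustration).
KS_WAVELENGTH = 2.15
SDSS_BANDS = {"i": 0.75, "z": 0.89}

def closest_rest_frame_band(redshift):
    """Return the SDSS band closest to the rest-frame wavelength seen in Ks."""
    rest_wavelength = KS_WAVELENGTH / (1.0 + redshift)
    return min(SDSS_BANDS, key=lambda band: abs(SDSS_BANDS[band] - rest_wavelength))
```

With these numbers, a galaxy simulated at z = 1 would be built from the z-band stamp and one at z = 2 from the i-band stamp.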

We did not take into account any variation of the PSF within 
the field for performing the simulations. We in fact expect this 
variation to be small and consequently not to induce strong 
changes in the morphology since the analyzed galaxies are sig- 
nificantly larger than the PSF. 

Among all the simulated objects, 450 were used for training
and the remaining 869 for testing.

3.2. Classification procedure

We use 450 simulated galaxies as the training sample. The
morphological mix is fixed to 50/50, i.e. 50% early-type galaxies
and 50% late-type. Even if this is not a realistic distribution,
it is required to minimize the errors in the SVM classification.
The classification is made in a 10-D volume with a Radial Basis
Function kernel (see Paper I for more details). The measured
parameters include 6 morphological parameters: Asymmetry,
Concentration (Conselice et al. 2000 and Abraham et al. 1996
definitions), Gini (Abraham et al. 2003), M20 (Lotz et al. 2004) and
Smoothness (Conselice et al. 2003); a distance parameter
(photometric redshift); a shape parameter (elongation); and
2 luminosity parameters (surface brightness and magnitude). See
Paper I for details on how these parameters are calculated.

In Paper I we showed, using the test sample, that increasing
the number of parameters of the classification never results in a
degeneracy that decreases the accuracy. We realized, however, that
when dealing with the real sample, some parameters might produce a
bias if they are not realistic. This is the case for the magnitude.
Indeed, when doing the simulations we use the same magnitude
distribution for early- and late-type galaxies. This creates an
artificial excess of early-type galaxies at faint magnitudes in the
simulated sample that is not seen in the real world. Therefore, if
the magnitude is used as an input parameter for the classification,
the number of early-type galaxies is overestimated. To avoid this
bias we proceed in 2 steps: first we make a 9-D classification
(without the magnitude) and then we use the measured magnitude
distribution per morphological type to generate a new simulated
sample with this magnitude distribution. We use this second sample
to make the final 10-D classification.



3.3. The output catalog

Classical support vector classifiers only predict a class label
(i.e. early-type or late-type) but no probability information.
Recently some authors (Platt 2000; Wu et al. 2004) have proposed
different methods to estimate the a posteriori probability, i.e.
given k classes of data, for any x the goal is to estimate
$p_i = p(y = i \mid x)$, $i = 1, \ldots, k$. The freely available
package libSVM (Chang & Lin 2001) implements the method described in
Wu et al. (2004). We therefore added this new feature to our
classification output: we associate to every galaxy in the
morphological catalog a class label and a probability of belonging
to the given class. Since we are dealing with a 2-class problem,
p(galaxy = early-type) = 1 - p(galaxy = late-type). In
the following, we use this parameter to assess the accuracy of
our classification.
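For a two-class problem, the idea behind such probability estimates can be illustrated with the sigmoid mapping of Platt (2000), which turns an SVM decision value f into p = 1 / (1 + exp(A f + B)). The sketch below is not the libSVM implementation: the coefficients `a` and `b` are arbitrary illustrative values (they are normally fitted on held-out data), and the convention that a positive decision value means early-type is an assumption.

```python
import math

def platt_probability(decision_value, a=-2.0, b=0.0):
    """Map an SVM decision value f to p(class = +1 | f) = 1 / (1 + exp(a*f + b)).

    `a` and `b` are normally fitted on held-out data; the defaults here are
    arbitrary illustrative values, not fitted ones.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))

def label_and_probability(decision_value):
    """Return (class label, probability of belonging to that class)."""
    p_early = platt_probability(decision_value)  # +1 is taken to mean early-type
    p_late = 1.0 - p_early                       # two classes: probabilities sum to 1
    return ("early-type", p_early) if p_early >= 0.5 else ("late-type", p_late)
```

By construction the probability attached to the returned label is never below 0.5, which matches the range of the probability bins used in the accuracy analysis.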



3.4. Accuracy 

As shown in Paper I, one of the main advantages of the employed
method is that the reliability of the classifications can be
quantified using a test sample simulated in the same way as the
training one.

Here we use the probability as the main estimator. We first
looked at the fraction of correct classifications as a function
of different probability thresholds. Results are shown in Table 1.
As we can see, there is a clear correlation between the probability
threshold and the number of correct identifications: the accuracy
clearly increases with the considered probability. If we select only
objects with a probability between 0.5 and 0.6, the mean accuracy is
only around 58%. However, objects with probabilities greater than
0.8 are classified with nearly 90% accuracy. The contamination is
around 20% for the whole sample
(p > 0.5).
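A table of this kind can be computed from the test sample, where the true type is known. The following sketch assumes the test sample is available as a list of (true label, predicted label, probability) tuples; this layout and the bin edges are illustrative choices.

```python
def accuracy_per_probability_bin(test_sample, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Fraction of correct classifications in each probability bin [P_min, P_max].

    test_sample : list of (true_label, predicted_label, probability)
    Returns a list of (P_min, P_max, accuracy or None, number of objects).
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        last = hi == edges[-1]  # the last bin also includes its upper edge
        in_bin = [(t, p) for t, p, prob in test_sample
                  if lo <= prob < hi or (last and prob == hi)]
        n_ok = sum(t == p for t, p in in_bin)
        rows.append((lo, hi, n_ok / len(in_bin) if in_bin else None, len(in_bin)))
    return rows
```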

This relation between the probability parameter and the success
rate enables the use of the probability as an estimate of the
classification accuracy as a function of physical parameters of
the galaxies directly on the real sample. We decided to represent
2D maps of the mean probability values (Fig. 5) for different
magnitude, redshift and size bins. We observe several interesting
trends:

- Globally, late-type galaxies have higher probabilities for all
redshift, magnitude and size values. This indicates that late-type
objects are easily isolated. It is probably a consequence of the
ellipticity parameter used in the classification procedure: objects
with high ellipticity are identified as late-type galaxies with high
probability, so that the mean probability increases.

- As expected, there is a clear trend with size for both
morphological types: small objects (r_half < 0.6") have lower
probabilities (p ~ 0.7) while large ones (r_half > 0.6") have higher
values (p > 0.8) (Fig. 5c,d).

- There is also a trend with redshift, especially for early-type
objects: below z ~ 1 the mean probability is around p ~ 0.75
and it decreases to p ~ 0.7 for z > 1. The number of galaxies
with low probability (p ~ 0.6) also increases (Fig. 5a,b).

- Finally, the magnitude is also important for determining the
quality of the classification. Above Ks ~ 20 the mean probability
for early-type objects is around p ~ 0.7 (Fig. 5a,e).
Interestingly, this trend does not appear for the late-type
population (Fig. 5b,f). This is probably a consequence of the
way the training sample is built (§ 3.2): at faint magnitudes,
the number of early-type galaxies is low, so the machine
tends to find late-type galaxies with higher probability.
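Maps such as those described above reduce to averaging the class probability in cells of two parameters. A minimal sketch, assuming each galaxy is a dictionary holding the two binned quantities and a `prob` entry (hypothetical key names):

```python
from collections import defaultdict

def mean_probability_map(galaxies, x_key, y_key, x_width, y_width):
    """Mean class probability in 2D cells of (x_key, y_key).

    Returns {(ix, iy): mean probability}; cells with no objects are absent,
    which corresponds to the black cells of such a map.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for gal in galaxies:
        cell = (int(gal[x_key] // x_width), int(gal[y_key] // y_width))
        total[cell] += gal["prob"]
        count[cell] += 1
    return {cell: total[cell] / count[cell] for cell in count}
```

Running it separately on the early-type and late-type subsamples would give the two columns of maps.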

How can we take these trends into account to correct our
classification and perform statistical analyses? The simplest
way seems to be to establish probability thresholds. But which
galaxies do we keep when selecting objects above a given probability
or, in other words, do we introduce any bias when selecting a sample
with a given probability cut?

To answer this question we selected objects with 4 probability
thresholds (p > 0.6, p > 0.7, p > 0.8 and p > 0.9) and
examined the completeness of the selected sample as a function
of the magnitude, the redshift, the size and the morphological
type. Figure 6 shows the results. If no biases were introduced,
all the lines should be flat (i.e. "we keep all the galaxies in the
same way"). We observe in Fig. 6 that this is not the case. This
is seen in particular when looking at the completeness as a function
of size and morphological type: large objects are preferred,
as expected and early-type objects are more penalized.
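The bias test amounts to computing, in each bin of a physical parameter, the fraction of objects that survive the probability cut. A short sketch under the same hypothetical data layout (a list of dictionaries with a `prob` key):

```python
def kept_fraction(galaxies, threshold, key, edges):
    """Fraction of galaxies with prob > threshold in each bin of `key`.

    A flat result across bins would mean the cut removes objects uniformly.
    """
    fractions = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [g for g in galaxies if lo <= g[key] < hi]
        kept = [g for g in in_bin if g["prob"] > threshold]
        fractions.append(len(kept) / len(in_bin) if in_bin else None)
    return fractions
```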

Therefore a sample selected by performing a probability cut
should not be used for global statistical analysis. It can be used,
however, to select a particular class of objects for which a very
good accuracy is needed. In the next section we show a
way of using the information brought by probabilities for
statistical purposes.

3.4.1. Best estimator 

Considering that we have N galaxies with a probability $p_k$ of
being in class A (e.g. early-type or late-type), we can define two
functions to estimate the number of objects in a given class.

Table 1. Accuracy of the classifications based on the test sample for a 10-parameter SVM training (C, A, G, Ell, S, M20, C2, SB,
z, mag) as a function of probability. The table shows the fraction of correct classifications when a probability bin [P_min, P_max] is
considered. Numbers between brackets give the number of objects per bin. Top: early-type galaxies, bottom: late-type galaxies.

The first one (hereafter counts estimator) uses a function

$$X_k = \begin{cases} 0 & \text{if } p_k < 0.5 \\ 1 & \text{if } p_k > 0.5. \end{cases}$$

Then the number of objects in class A is simply $N_A = \sum_k X_k$.
Objects with a probability greater than 0.5 of belonging to a
given class are simply added. With this estimator we therefore
treat in the same way a galaxy with p = 0.51 and one with
p = 0.99; it is just added to the corresponding class and the
probability is ignored.

The second one (hereafter probability estimator) tries to
make use of the information contained in the probability parameter.
It has been shown in the previous section that galaxies with
higher probabilities have lower classification errors. However,
making arbitrary probability cuts introduces biases in the global
statistics, in the sense that not all the objects are removed
uniformly. It would therefore be interesting to find a way of using
the information contained in this parameter without introducing
biases. For that purpose, we define a random variable

$$Y_k = \begin{cases} 0 & \text{with a probability } 1 - p_k \\ 1 & \text{with a probability } p_k. \end{cases}$$

Then we can estimate the number of objects in class A as the
mathematical expectation of the sum $Y = \sum_k Y_k$,
$E(Y) = \sum_k E(Y_k) = \sum_k p_k$, and the 1σ error on the number
as the square root of the variance of the variable,
$\mathrm{Var}(Y) = E(Y^2) - E(Y)^2 = \sum_k p_k (1 - p_k)$.
This way, a galaxy with a probability p = 0.51
of being in class A counts as 0.51 in this class but also as 0.49 in
class B.

Therefore, the larger the number of galaxies with probability
values close to 0.5, the larger the differences between the two
estimators. As a matter of fact, if all the probabilities are equal
to one, both estimators give the same results, whereas if all the
probabilities are equal to 0.5, the probability estimator gives half
the value of the counts estimator. In this sense the comparison of
the results provided by the two estimators is
a kind of measure of the classification accuracy.
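The two estimators can be written in a few lines. The sketch below takes the list of probabilities $p_k$ of belonging to class A; the function names are illustrative.

```python
import math

def counts_estimator(probs_a):
    """Number of objects in class A: one count for every p_k > 0.5."""
    return sum(1 for p in probs_a if p > 0.5)

def probability_estimator(probs_a):
    """Expected number in class A and its 1-sigma error.

    E(Y) = sum(p_k), Var(Y) = sum(p_k * (1 - p_k)).
    """
    expected = sum(probs_a)
    sigma = math.sqrt(sum(p * (1.0 - p) for p in probs_a))
    return expected, sigma
```

When every probability equals 1 the two functions return the same number with zero error, while for probabilities all just above 0.5 the second returns about half of the first.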

3.4.2. Errors due to the training set 

The estimator presented above takes into account the information
brought by the probability parameter to best estimate the number of
galaxies of each morphological type and to correct for
misclassifications. However, there is another source of error which
has to be considered. It is related to the training set itself.
Indeed, even if the training sample is built to reproduce as closely
as possible the parameters of the real one, it can contain errors,
or there can exist galaxies which are not well represented in the
parameter space. In order to estimate these errors, we performed
Monte Carlo simulations: we randomly removed elements from the
training sample and thereby generated multiple training samples with
fewer galaxies. We then used these samples to train different
machines and classify the real sample. The generated samples have
similar properties to the original one but do not fill the parameter
space in the same way. They can consequently be used to estimate the
effect on the final classification of missing objects in the
parameter space. The differences found in the classifications are
then employed to estimate a kind of confidence
region of our classification scheme in the following sections.
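The resampling step can be sketched as follows. The fraction of objects kept, the number of realizations and the use of the minimum and maximum as the "confidence region" are illustrative assumptions; the text does not quote these choices.

```python
import random

def training_subsamples(training_set, keep_fraction=0.8, n_realizations=10, seed=0):
    """Generate reduced training sets by randomly removing elements.

    keep_fraction and n_realizations are illustrative choices, not values
    quoted in the text.
    """
    rng = random.Random(seed)
    n_keep = int(len(training_set) * keep_fraction)
    return [rng.sample(training_set, n_keep) for _ in range(n_realizations)]

def confidence_region(values):
    """Summarize results from the different machines by their min and max."""
    return min(values), max(values)
```

Each subsample would be used to train a separate machine, and a statistic of interest (for instance the early-type fraction in a redshift bin) collected across machines and passed to `confidence_region`.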

4. Results and discussion 

4.1. Global statistics 

We find 33 291 (~75%) late-type galaxies and 10 798 (~25%)
early-type galaxies with the counts estimator, and 30711.7 ± 1952
(70% ± 5%) late-type and 13376.3 ± 2014 (30% ± 5%) early-type
galaxies with the probability estimator. In the next sections we try
to locate the differences more precisely.

Fig. 5. Bi-dimensional maps of the mean probabilities for different redshift, magnitude and size bins of the real sample. The size is
represented by the half-light radius as measured by SExtractor. The plots show the mean probability distribution as a function of a
single parameter. Error bars show the dispersion of the probability distribution. In the left column we show the probability maps for
early-type objects and in the right column the maps for late-type objects. The black cells indicate that there are no objects in the considered
bin.

4.2. Number counts 

Figure 7 shows the number counts per morphological type using
the two estimators described above. As expected, early-type
objects dominate the bright end of the magnitude distribution
while late-type objects are more frequent at the faint end. The
effect of using the probability estimator is seen clearly in the
faint elliptical population (Ks > 20). In particular, the number of
faint early-type galaxies increases when using the probability
estimator. This reflects the fact that at faint magnitudes
early-type galaxies are classified with lower probabilities, as seen
in the 2D maps (Fig. 5). The use of the probability parameter in
performing the counts makes it possible to correct for the
classification errors by adding galaxies that fell in the late-type
class but had a significant probability of being elliptical.

Fig. 6. Sample completeness as a function of redshift. Left column: early-type galaxies; right column: late-type galaxies. (a)-(d) size,
(b)-(e) redshift and (c)-(f) magnitude for different probability cuts.

Dotted line: galaxies with p > 0.6, dashed line: galaxies with p > 0.7, dashed-dotted line: galaxies with p > 0.8 and dashed line with three dots:
galaxies with p > 0.9. Error bars are calculated using Poisson √n statistics.



4.3. Morphological evolution 

Figure 8 shows the evolution of the morphological mixing up to
z ~ 2. Both estimators reveal an increase in the early-type fraction
since z ~ 2; however, the effect is less pronounced when the
probability is taken into account. With the probability estimator
we find 21.9% ± 8% early-type objects at z ~ 2, while the local
fraction is 32.0% ± 5%. The counts estimator predicts a fraction
of ~15% at z ~ 1.5. Taking the probability into account therefore
helps to correct for the incompleteness of the early-type population
at high z caused by the lower probability values.



Fig. 7. Number counts per morphological type for the 44 089
analyzed galaxies. The solid line shows the results obtained with
the probability estimator; the line width indicates the 1-σ
confidence band estimated from the probability distribution, and the
dashed region is the confidence region deduced from Monte
Carlo simulations of the training sample (see text for details).
The dashed line shows the results obtained with the counts
estimator. Red lines stand for early-type galaxies and blue lines for
late-type. Error bars are calculated using Poisson √n statistics.

This variation in the early-type population is a well-known
effect which has been detected using rest-frame morphologies
from HST (e.g. Brinchmann et al. 1998; Cassata et al. 2005) and
probably reflects the build-up of the red sequence from late-type
systems. It could be argued, however, that this effect is a
consequence of the morphological k-correction, since at higher
redshifts we probe the UV galaxy emission, i.e. young stellar
populations. The fact that we still observe this trend with NIR data
(which probe older stellar populations) suggests that it is a real
effect and not a morphological k-correction effect. It is
important to point out that the fraction found here at z ~ 1.5 is
significantly higher than the one obtained by Cassata et al. (2005),
Arnouts et al. (2007) or Abraham et al. (2007). This weak evolution
could be explained by a selection effect. The K < 21.5
selected sample does not probe the same galaxy populations at
z ~ 1.5 and z ~ 0: at z ~ 1.5, only the high-mass end of the
K-band luminosity function (LF) is sampled, whereas at z ~ 0 the LF
is sampled over several magnitudes. This will be investigated in
a forthcoming paper in which we will use stellar mass estimates
to build a volume-limited sample. In this paper we focus on the
morphological k-correction effects, and for that purpose a
magnitude-limited sample is sufficient. The following sections are
therefore focused on this important point.
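The selection effect described above can be illustrated by computing the faintest absolute magnitude that a K < 21.5 sample reaches at different redshifts. The sketch below is only indicative: the flat ΛCDM parameters (H0 = 70 km/s/Mpc, Ωm = 0.3) are assumptions, not values taken from this paper, and k-corrections and evolution are ignored. The function names are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light [km/s]

def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=10000):
    """Luminosity distance in a flat LambdaCDM cosmology (assumed parameters)."""
    zz = np.linspace(0.0, z, n_steps + 1)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    # Trapezoidal integration of dz / E(z) gives the comoving distance
    # in units of the Hubble distance c / H0.
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return (1.0 + z) * (C_KM_S / h0) * integral

def limiting_absolute_magnitude(z, m_lim=21.5):
    """Faintest absolute magnitude reachable at redshift z (no k-correction)."""
    d_pc = luminosity_distance_mpc(z) * 1.0e6
    return m_lim - 5.0 * np.log10(d_pc / 10.0)

for z in (0.1, 0.5, 1.0, 1.5):
    print(f"z = {z:.1f}: M_lim ~ {limiting_absolute_magnitude(z):.1f}")
```

Under these assumptions the limiting absolute magnitude is several magnitudes brighter at z ~ 1.5 than at low redshift, so only the bright end of the luminosity function enters the high-redshift bins.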

5. Investigating the morphological k-correction effect

In the previous section, we obtained a morphological
classification from NIR imaging. A crucial point is to understand the
differences, if any, between morphologies quantified in the visible
and the ones computed here. For that purpose we match the
Ks-selected morphological catalog with morphologies measured
independently from HST/ACS I-band data.

For the classification of the I-band sample we use a method
similar to the one used for the K-band sample: we train a 5-D
support vector machine (gini, concentration, asymmetry and
ellipticity). As shown in Paper I, this is largely sufficient for a
well-resolved sample like the ACS one. Since ACS galaxies are better
resolved than WIRCam ones, the training sample is not built from an
SDSS sample but by visually classifying 500 real ACS galaxies.
The visual classification procedure is fully described
in Tasca et al. 2008 (in preparation).

Fig. 8. Redshift distribution per morphological type for the
44 089 analyzed galaxies. The solid line shows the results
obtained with the probability estimator; the line width indicates
the 1-σ confidence band estimated from the probability
distribution. Empty triangles show the results obtained with the
counts estimator. Dashed regions for both estimators are the
confidence regions deduced from Monte Carlo simulations of the
training sample (see text for details). Red lines stand for
early-type galaxies and blue lines for late-type. Error bars are
calculated using Poisson √n statistics.

We match 32 171 objects of our K-selected sample. The
remaining objects were too faint in the ACS data (I > 24)
to perform reliable morphological classifications. The global
morphological mixing from HST/ACS data is in good agreement
with the one obtained with WIRCam for the same objects:
9163 (I-band) and 10458.8 (Ks-band) early-type galaxies, and
25002 (I-band) and 23706.2 (Ks-band) late-type galaxies.
Where exactly, then, are the differences located?

5.1. One-to-one comparison 

Figure 9 shows the mean probability for a galaxy classified as
early (late) in the Ks-band to be classified as early (late) in the
I-band as a function of redshift.

We observe that, globally, there is no ambiguity in the
identification of late-type objects. A galaxy which is found to be
late-type in the Ks-band is also a late-type system in the I-band
with a probability around p ~ 0.9 up to z ~ 2.

For early-type objects, there is a clear trend with redshift:
below z ~ 0.8, a galaxy classified as early-type in the Ks-band has
a probability of p ~ 0.8 of being early-type in the I-band. Above
z ~ 1 the discrepancies between the two classifications increase,
and an elliptical galaxy is classified as early-type in both
classifications only with a probability of p ~ 0.4. Above z ~ 1 the
I-band filter starts to probe the UV flux, so the measured
morphology is determined by young stars, whereas the Ks-band filter
still probes the visible spectrum of the galaxy. We are thus
probably seeing a k-correction effect: late-type objects are
identified in the same way since their stellar populations are
younger, whereas for early-type objects the ambiguity is higher
and a fraction of objects tends to move to later morphological
types.
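The one-to-one comparison can be sketched as a binned average. The snippet below is an illustrative reading of the procedure, not the authors' implementation: it assumes that, for the galaxies labelled early-type in the Ks-band, the I-band probability of being early-type is averaged in redshift bins. The function name, the bin edges and the toy catalogue are hypothetical.

```python
import numpy as np

def mean_agreement_by_redshift(z, is_early_ks, p_early_i, z_edges):
    """Mean I-band early-type probability of Ks-band early types per z bin."""
    z = np.asarray(z, dtype=float)
    sel = np.asarray(is_early_ks, dtype=bool)
    p = np.asarray(p_early_i, dtype=float)
    means = np.full(len(z_edges) - 1, np.nan)
    for i in range(len(z_edges) - 1):
        in_bin = sel & (z >= z_edges[i]) & (z < z_edges[i + 1])
        if in_bin.any():
            means[i] = p[in_bin].mean()
    return means  # NaN marks bins without Ks-band early types

# Synthetic toy catalogue (illustrative values only, not survey data).
z = np.array([0.3, 0.5, 0.7, 1.2, 1.4, 1.6])
is_early_ks = np.array([True, True, False, True, True, True])
p_early_i = np.array([0.9, 0.7, 0.1, 0.5, 0.3, 0.4])
print(mean_agreement_by_redshift(z, is_early_ks, p_early_i, [0.0, 1.0, 2.0]))
```

The same function applied to the late-type selection and the I-band late-type probability would give the corresponding curve for late-type objects.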

What, then, is the difference between morphologically selecting a
sample in the Ks-band and in the I-band above z ~ 1?

5.2. Comparing redshift distributions

Figure 10 shows the redshift distributions obtained from both
classifications. The distributions for late-type objects show only
small differences, as expected from the comparison performed in
the previous section. Note, however, that in the lowest redshift
bin the two classifications find different values. This is probably
a consequence of classification errors, since very large galaxies
are not well represented in the training sample. A
Kolmogorov-Smirnov test (KS-test) indicates that the two
distributions are drawn from the same parent distribution with 96%
confidence. The early-type redshift distributions differ more, as
expected from the comparisons of the previous section, in
particular above z ~ 1. The match between the two distributions as
computed from the KS-test is high at z < 1 (97%) but clearly
decreases above z ~ 1 (67%). Interestingly, the I-band estimate
falls outside the confidence region at z > 1. We find in particular
an excess of early-type galaxies by a factor ~1.5 in the Ks-band
when compared to the I-band. Above z ~ 1.5 the number of objects is
small and the uncertainties are large. Even though the Ks-band
classification is less accurate at z ~ 1.5 (§ 3.4), the difference
remains significant after taking into account most of the
classification errors. The differences are probably due to
morphological k-correction effects. It therefore seems that a
morphological classification based on HST/ACS I-band data at those
redshifts tends to underestimate the elliptical population.
Nevertheless, these results have to be confirmed by a more precise
study of the early-type population at z ~ 1.5, based on star
formation histories and mass distributions.

Fig. 9. One-to-one comparison of the ACS and WIRCam
classifications as a function of redshift. Red line: mean
probability for a galaxy classified as early-type in the K-band to
be classified as early-type in the I-band (HST). Blue line: mean
probability for a galaxy classified as late-type in the K-band to be
classified as late-type in the I-band. Error bars show the
dispersion of the probability distribution.
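For reference, a two-sample Kolmogorov-Smirnov comparison of two redshift samples can be sketched as follows. This is a generic implementation of the standard test with the asymptotic p-value, not the authors' code; how the percentages quoted in Sect. 5.2 were derived from the test is not specified in the text, and the input samples below are synthetic.

```python
import numpy as np

def ks_two_sample(a, b, n_terms=100):
    """Two-sample KS statistic D and its asymptotic p-value."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated at every data point.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    d = float(np.max(np.abs(cdf_a - cdf_b)))
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    n_eff = np.sqrt(a.size * b.size / (a.size + b.size))
    lam = (n_eff + 0.12 + 0.11 / n_eff) * d
    if lam < 0.3:
        # The series converges slowly here and the p-value is ~1 anyway.
        return d, 1.0
    j = np.arange(1, n_terms + 1)
    p = 2.0 * np.sum((-1.0) ** (j - 1) * np.exp(-2.0 * (j * lam) ** 2))
    return d, float(min(max(p, 0.0), 1.0))

# Synthetic redshift samples (illustration only).
rng = np.random.default_rng(0)
z_ks = rng.uniform(0.2, 2.0, size=300)
z_i = rng.uniform(0.2, 2.0, size=300)
print(ks_two_sample(z_ks, z_i))
```

A high p-value means that the test cannot distinguish the two samples; a low one indicates that they are unlikely to share the same parent distribution.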

Another possible explanation for this result is the relative sizes of
the PSFs of the two datasets: a typical galaxy in the ACS images is
covered by about fifty times more resolution elements than in the
K-band data (since the PSF sizes are 0.1" vs 0.7", i.e. a factor ~7
in linear resolution), so the K-band data are better at picking up
faint, low surface brightness envelopes, which might be missed in
the ACS data because the pixels are too small and the noise is too
high. A test of this would be to convolve the ACS data to a PSF that
matches the K-band data. We plan to carry out this test in a
forthcoming paper.
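As a rough guide to the proposed test, the width of the convolution kernel can be estimated from the two PSF sizes. The sketch below assumes that both PSFs are Gaussian, which is a simplification, so that widths add in quadrature; the function name is hypothetical and only the two PSF sizes come from the text.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def matching_kernel_fwhm(fwhm_sharp, fwhm_broad):
    """FWHM of the Gaussian kernel that degrades the sharp PSF to the broad one."""
    if fwhm_broad <= fwhm_sharp:
        raise ValueError("target PSF must be broader than the input PSF")
    return math.sqrt(fwhm_broad ** 2 - fwhm_sharp ** 2)

acs_fwhm, wircam_fwhm = 0.1, 0.7  # arcsec, PSF sizes quoted in the text
kernel_fwhm = matching_kernel_fwhm(acs_fwhm, wircam_fwhm)
kernel_sigma = kernel_fwhm * FWHM_TO_SIGMA
# Ratio of the number of resolution elements per unit area.
area_ratio = (wircam_fwhm / acs_fwhm) ** 2
print(f"kernel FWHM = {kernel_fwhm:.3f} arcsec, sigma = {kernel_sigma:.3f} arcsec")
print(f"resolution-element ratio (area) = {area_ratio:.0f}")
```

Because the ACS PSF is much narrower than the WIRCam one, the matching kernel is almost as wide as the WIRCam PSF itself.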

6. Summary and conclusions 

We have presented a morphological classification into two main
morphological types of 44 089 galaxies within the COSMOS
field from seeing-limited near-infrared imaging. Morphologies
are estimated with the non-parametric N-dimensional code
galSVM using a 10-dimensional volume and non-linear
boundaries.

The final output catalog includes, for every galaxy, a class
label (early or late) and a probability of belonging to the class.
The probability is shown to be highly correlated with the success
rate and can therefore be used to assess the accuracy of the
classifications.

This classification method has been used to obtain the
number counts and the redshift distribution per morphological
type up to z ~ 2 and to compare the results with the ones
obtained from HST/ACS imaging on 32 171 galaxies in order to
quantify morphological k-correction effects.

Fig. 10. Redshift distribution per morphological type for the
same 32 171 Ks-selected galaxies observed in the I-band and in
the Ks-band. Red and pink lines are WIRCam and ACS early-type
objects respectively, and blue and violet lines are WIRCam
and ACS late-type objects. The line width is the 1-σ confidence
interval computed from the probability distribution, the dashed
region is the confidence region deduced from Monte Carlo
simulations of the training sample and the error bars are
calculated using Poisson √n statistics (see text for details).



Our main conclusions are summarized below.

Concerning the reliability of our morphological classification:



(i) According to the simulated test sample, the average success
rate is ~80% for the whole sample for both morphological
classes, leading to ~20% contamination (i.e. fraction of
failures).

(ii) The probability parameter which results from the
classification procedure is a good estimator of the reliability of
the classifications, since it is highly correlated with the success
rate. Objects with probabilities greater than 0.8 are identified
with nearly 90% confidence.

(iii) The study of the probability distributions as a function of the
magnitude, the size and the redshift of the galaxies reveals
that the most difficult class to isolate is that of faint (Ks > 20),
small (log(r_half) < -0.2) early-type objects above z ~ 1.
However, even for this class of objects, the average
probability is around p ~ 0.7.

(iv) We also showed that a selection based on a probability
threshold leads to a sample biased towards late-type systems.
We proposed, however, a way of integrating the information
brought by the probability parameter to perform statistical
analyses.

(v) Errors due to the training sample dominate the error budget.
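The bias mentioned in item (iv) can be illustrated with a toy calculation of the fraction of each class retained above a probability cut, using the same cuts as in Fig. 6. The catalogue below is synthetic and deliberately assigns lower class probabilities to early types, as an assumption made for illustration; the function name is hypothetical.

```python
import numpy as np

def retained_fraction(is_early, p_class, threshold):
    """Fraction of each class kept when requiring p_class > threshold."""
    is_early = np.asarray(is_early, dtype=bool)
    keep = np.asarray(p_class, dtype=float) > threshold
    early_kept = keep[is_early].mean()
    late_kept = keep[~is_early].mean()
    return early_kept, late_kept

# Synthetic catalogue: class label and probability of belonging to it.
is_early = np.array([True, True, True, True, False, False, False, False])
p_class = np.array([0.95, 0.75, 0.65, 0.55, 0.95, 0.92, 0.85, 0.65])
for cut in (0.6, 0.7, 0.8, 0.9):
    early, late = retained_fraction(is_early, p_class, cut)
    print(f"p > {cut:.1f}: early kept {early:.2f}, late kept {late:.2f}")
```

When one class is systematically classified with lower probabilities, any fixed cut removes a larger share of that class, which is why the probability is better used as a weight than as a selection threshold.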

We measure a global morphological mixing of
30711.7 ± 1952 (~70% ± 5%) late-type galaxies and
13376.3 ± 2014 (~30% ± 5%) early-type galaxies. There are
~32% ± 5% early-type galaxies at z < 0.4 in our sample,
whereas the fraction at z ~ 1.5 is 21.9% ± 8%. This effect
persists even after correcting for classification errors using the
probability parameter. The measurement qualitatively confirms
the trend observed in previous studies and interpreted as a
consequence of a progressive build-up of the red sequence from
late-type objects (e.g. Abraham et al. 2007; Arnouts et al. 2007).
In a forthcoming paper we will investigate the evolution of the
fraction of the types defined with our method, using a
volume-limited rather than a magnitude-limited sample.

The comparison of our morphologies with the ones obtained
with HST/ACS in the I-band for 32 171 objects reveals
several interesting trends:

(i) The global morphological mixings are broadly consistent.
We find 9 163 early-type galaxies and 25 002 late-type
galaxies in the I-band, and 10 458 early-type and 23 706 late-type
galaxies in the Ks-band.

(ii) A galaxy classified as late-type in the I-band has a mean
probability of p ~ 0.9 of being classified as late-type in the
Ks-band.

(iii) The match between the two photometric bands for early-type
galaxies depends on redshift: below z ~ 1, an early-type
galaxy in the I-band has a probability of p ~ 0.7 of being
early-type in the Ks-band. Above this redshift, where the
HST cameras are probing the UV flux, the probability
decreases and reaches p ~ 0.4 at z ~ 1.5.

(iv) The comparison of the redshift distributions also reveals a
redshift dependence: below z ~ 1 the two redshift
distributions match quite well. The Kolmogorov-Smirnov test gives
a probability of match of p ~ 0.97. Above z ~ 1, the I-band
classification tends to find fewer early-type galaxies than
the Ks-band one, by a factor ~1.5. This probably reflects a
morphological k-correction effect. Therefore, studies based
on HST classifications at those redshifts could underestimate
the elliptical population.
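As a simple arithmetic check, an excess factor between two counts can be converted into a missing fraction. The sketch below assumes that the Ks-band count is taken as the reference population, which is an assumption of this illustration and not necessarily the definition used for the percentages quoted in the text; the function name is hypothetical.

```python
def missing_fraction(n_reference, n_observed):
    """Fraction of the reference population absent from the observed one."""
    return 1.0 - n_observed / n_reference

# A Ks-band excess by a factor ~1.5 over the I-band counts, item (iv).
print(f"{missing_fraction(1.5, 1.0):.2f}")
# Global early-type totals quoted in item (i): Ks-band vs I-band.
print(f"{missing_fraction(10458.0, 9163.0):.2f}")
```

With this definition, a factor ~1.5 corresponds to roughly one third of the early-type galaxies being missed, while the global totals differ by a much smaller fraction because most of the sample lies at lower redshift.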

The results presented here quantify the bias in morphological
classification due to morphological band shifting between I-band
and K-band imaging data. We estimate that the fraction of missing
early-type galaxies at z ~ 1 from I-band ACS data is about
19%, increasing to 31% at z ~ 1.5-2. From our K-band data we
estimate that the fraction of early-type galaxies has increased by
10% between z = 1.5 and z = 0, a further confirmation of the
gradual build-up of the elliptical galaxy population at the expense
of late-type galaxies. The Ks-band classification performed here is
intended as a framework for future studies of the evolution of
counts, luminosities, luminosity densities and correlation
functions for each morphological type over several redshift bins on
a volume-limited sample.